-
Notifications
You must be signed in to change notification settings - Fork 916
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Compile-time ipow computation with array lookup #15110
Compile-time ipow computation with array lookup #15110
Conversation
/ok to test |
/ok to test |
…into issue_9346 Update local branch with changes from remote
/ok to test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
I've updated the PR description a bit so the corresponding issue can be closed once this PR gets merged. Thanks!
Use doxygen @note Co-authored-by: Yunsong Wang <[email protected]>
/ok to test |
Note that it's unclear (to me) what the optimal algorithm is for get_power(). It was previously the (logarithmic) squaring algorithm instead of this recursive one. However since we have to compute every power to fill the array, the compiler may be smart enough to optimize that, or benefit from caching effects. Either way this call is only performed at compile time (good candidate for consteval in C++20), and the compile time is dominated by the type dispatcher anyway. So we'll just use the recursive algorithm for now (simplest, perhaps easiest for compiler to optimize). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, thank you for adding the comment!
Valid point. I won't worry much about a build-time recursive call. |
/merge |
The addition of an array of integers in this function placed too much register pressure on our code base. This function is used by the fixed_point constructor and cast operators, so it potentially affects every kernel. Too many unrelated kernels were impacted and suffered performance degradations to justify this change. This reverts the algorithm introduced in #15110 to what it was previously, with some very minor tweaks. Authors: - Paul Mattione (https://github.com/pmattione-nvidia) Approvers: - Yunsong Wang (https://github.com/PointKernel) - Mike Wilson (https://github.com/hyperbolic2346) - Shruti Shivakumar (https://github.com/shrshi) - MithunR (https://github.com/mythrocks) URL: #15242
Description
Compile-time ipow() computation with array lookup. Results in up to 8% speed improvement for decimal64 -> double conversions. Improvement is negligible for other conversions but is not worse. New benchmark test will be in a separate PR. Fix fixed_point -> string conversion test. Also fix rounding comments. Closes #9346
Checklist